Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer

Authors

  • Yuchen Huang
  • Zhiyong Wu
  • Runnan Li
  • Helen M. Meng
  • Lianhong Cai
Abstract

Prosodic structure generation from text plays an important role in Chinese text-to-speech (TTS) synthesis and greatly influences the naturalness and intelligibility of the synthesized speech. This paper proposes a multi-task learning method for prosodic structure generation using a bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) with a structured output layer (SOL). Unlike traditional methods, which usually require prerequisites such as lexicon words or even syntactic trees as input, the proposed method predicts prosodic boundary labels directly from Chinese characters. The BLSTM RNN is used to capture the bidirectional contextual dependencies of prosodic boundary labels. The SOL further models correlations among prosodic structures, lexicon words, and part-of-speech (POS) tags: the prediction of prosodic boundary labels is conditioned on the word tokenization and POS tagging results. Experimental results demonstrate the effectiveness of the proposed method.
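
The conditioning described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the BLSTM is replaced by a random per-character feature vector `h`, and the label-set sizes, the weight initialization, the concatenation-based conditioning, and the equally weighted summed cross-entropy loss are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, vec):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def init_matrix(rows, cols, rng):
    """Small random weights (hypothetical initialization)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def structured_output(h, params):
    """Predict three label distributions for one character.

    The prosodic-boundary distribution is conditioned on the
    word-segmentation and POS distributions by concatenating them
    to the hidden features (an assumed form of the SOL).
    """
    seg = softmax(linear(params["seg"], h))
    pos = softmax(linear(params["pos"], h))
    pros = softmax(linear(params["pros"], h + seg + pos))
    return seg, pos, pros


def multitask_loss(outputs, targets, task_weights=(1.0, 1.0, 1.0)):
    """Weighted sum of per-task cross-entropies (assumed loss form)."""
    return sum(
        -w * math.log(p[t]) for w, p, t in zip(task_weights, outputs, targets)
    )


# Hypothetical sizes: hidden features and the three label sets.
H, N_SEG, N_POS, N_PROS = 8, 4, 5, 4
rng = random.Random(0)
params = {
    "seg": init_matrix(N_SEG, H, rng),
    "pos": init_matrix(N_POS, H, rng),
    "pros": init_matrix(N_PROS, H + N_SEG + N_POS, rng),
}

# Stand-in for the BLSTM output at one character position.
h = [rng.uniform(-1.0, 1.0) for _ in range(H)]
seg, pos, pros = structured_output(h, params)
loss = multitask_loss((seg, pos, pros), targets=(0, 2, 1))
```

In a trained model the three output layers would share the BLSTM parameters, so gradients from the auxiliary segmentation and POS tasks would also shape the features used for prosodic boundary prediction.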

Similar resources

A Study on BLSTM-RNN-based Chinese Prosodic Structure Prediction in a Unified Framework with Character-level Features

In a text-to-speech system, prosodic attributes have to be predicted from the input text alone. The accuracy of prosody prediction has a significant effect on the naturalness of synthesized Chinese speech. In this paper, we explore using neural networks to predict prosodic boundaries from Chinese text without task-specific knowledge or sophisticated feature engineering. We examine sequence characte...

Learning Distributed Word Representations For Bidirectional LSTM Recurrent Neural Network

The bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) has been successfully applied in many tagging tasks. BLSTM-RNN relies on the distributed representation of words, which implies that the former can be further improved by learning the latter better. In this work, we propose a novel approach to learn distributed word representations by training BLSTM-RNN on a spe...

RNN-BLSTM Based Multi-Pitch Estimation

Multi-pitch estimation is critical in many applications, including computational auditory scene analysis (CASA), speech enhancement/separation, and mixed speech analysis; however, despite much effort, it remains a challenging problem. This paper uses the PEFAC algorithm to extract features and proposes the use of recurrent neural networks with bidirectional long short-term memory (RNN-BLSTM) to m...

Deep Neural Network Based Acoustic-to-Articulatory Inversion Using Phone Sequence Information

In recent years, neural-network-based acoustic-to-articulatory inversion approaches have achieved state-of-the-art performance. One major issue with these approaches is the lack of phone sequence information during inversion. To address this issue, this paper proposes an improved architecture hierarchically concatenating phone classification and articulatory inversion co...

Handwritten Nastaleeq Script Recognition with BLSTM-CTC and ANFIS method

Recurrent neural networks (RNNs) have been successfully applied to the recognition of cursive handwritten documents in both English and Arabic scripts. The ability of RNNs to model context in sequence data such as speech and text makes them a suitable candidate for developing OCR systems for printed Nastaleeq scripts (including Nastaleeq for which no OCR system is available to date). In this work, we have pr...

Publication date: 2017